Querying Linguistic Trees
Abstract
Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These linguistic annotations are often syntactic or prosodic in nature, and have a hierarchical structure. Query languages are used to select particular structures of interest, or to project out large slices of a corpus for external analysis. Existing languages suffer from a variety of problems in the areas of expressiveness, efficiency, and naturalness for linguistic query. We describe the domain of linguistic trees and discuss the expressive requirements for a query language. Then we present a language that can express a wide range of queries over these trees, and show that the language is first-order complete over trees.
Related resources
Querying Dependency Treebanks in XML
The need for manual editing during construction of a treebank may impose constraints on the representation of dependency trees which are not optimal for linguistic exploration. Using XML-technology it is possible to maintain the treebank both in a form suitable for editing and in a form suitable for linguistic exploration. By choosing a compact representation, we can use XPath directly as query...
Using MONA for Querying Linguistic Treebanks
MONA is an automata toolkit providing a compiler for compiling formulae of monadic second order logic on strings or trees into string automata or tree automata. In this paper, we evaluate the option of using MONA as a treebank query tool. Unfortunately, we find that MONA is not an option. There are several reasons why, the main being unsustainable query answer times. If the treebank contains lar...
Implementing Linguistic Query Languages Using LoToS
A linguistic database is a collection of texts where sentences and words are annotated with linguistic information, such as part of speech, morphology, and syntactic sentence structure. While early linguistic databases focused on word annotations, and later also on parse-trees of sentences (so-called treebanks), the recent years have seen a growing interest in richly annotated corpora of histor...
A Data Model for Fuzzy Linguistic Databases with Flexible Querying
Information to be stored in databases is often fuzzy. Two important issues in research in this field are the representation of fuzzy information in a database and the provision of flexibility in database querying, especially via including linguistic terms in human-oriented queries and returning results with matching degrees. Fuzzy linguistic logic programming (FLLP), where truth values are ling...
Identifying complex phenomena in a corpus via a treebank lens
While syntactically annotated corpora known as treebanks have been available for many years, along with a variety of customized tools for querying these annotations, the mapping from actual annotations to relevant syntactic or semantic phenomena has been obscured by the coarse-grained labelling of nodes in the parse trees which make up the treebanks. This lack of linguistic detail has hampered ...
Journal:
- Journal of Logic, Language and Information
Volume 19
Publication date: 2010